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This technology report describes the longitudinal registration approach that we intend to 
incorporate into SPM12. It essentially describes a group-wise intra-subject modeling frame- 
work, which combines diffeomorphic and rigid-body registration, incorporating a correction 
for the intensity inhomogeneity artifact usually seen in MRI data. Emphasis is placed on 
achieving internal consistency and accounting for many of the mathematical subtleties that 
most implementations overlook. The implementation was evaluated using examples from 
the OASIS Longitudinal MRI Data in Non-demented and Demented Older Adults. 
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1. INTRODUCTION 

Growth, plasticity, aging, and degeneration are inherently lon- 
gitudinal processes; while much can be learned by studying a 
cross-sectional sample of subjects at different stages of such 
processes, longitudinal data have well-established advantages in 
terms of increasing power and reducing confounds. To give 
just one example, cross-sectional studies of aging are chal- 
lenged by the relative subtlety of changes over time com- 
pared to the large inter- individual variation, and they can never 
fully separate true aging effects from confounding effects of 
birth-year such as changes in nutrition. Longitudinal anatom- 
ical MRI provides a framework for characterizing many of 
the macroscopic brain changes in natural development and in 
response to disease or injury. A MEDLINE search for key- 
words "longitudinal," "brain," and "MRI" in the topic, gives over 
1,500 hits. 

Longitudinal data require appropriate statistical models, due to 
the dependence of repeated measurements within-subject. Beyond 
this, it is also recognized that there can be substantial gains in 
power from using image processing and modeling algorithms that 
are specifically designed for longitudinal data. For example, the 
boundary shift integral (Freeborough and Fox, 1997; Leung et al., 
2012), which provides a direct measure of the difference between 
two brain volumes, has proven to be substantially more power- 
ful than simple subtraction of the two brain segmentations upon 
which it is based. However, the reduction in measurement vari- 
ability comes at a price of increased risk of various forms of bias. 
In particular, any technique that is not symmetric 1 with respect 



1 Following Tagare et al. (2009) we prefer the term "symmetric" to alternatives such as 
"inverse-consistent" because of its formal mathematical definition, which includes 
not only invariance to swapping a pair of images, but also invariance to permutations 
of more than two images. 



to the multiple time-points has the potential to introduce false 
positive differences. 

Concerns about asymmetry in pairwise image registration, 
related to one image being chosen as a (fixed) reference and the 
other as a (moving) source, first arose at the turn of the cen- 
tury (Ashburner et al., 1999; Cachier and Rey, 2000; Christensen 
and Johnson, 2001). In particular, Smith et al. (2001) discuss the 
potential use of the matrix square root of an affine transformation 
to derive a "half-way" space between two images 2 . This concept 
has since been generalized to more than two images, as described 
further below. Despite this early recognition of the potential 
problem, many (probably most) studies employing image reg- 
istration for longitudinal data over the past decade have used 
methods with some form of asymmetry [usually registering the 
later time-point(s) to the baseline]. 

More recently, it has been empirically demonstrated that these 
theoretical concerns can indeed cause practical problems. Pair- 
wise registration results have been shown to differ depending 
on which image is chosen as the reference (Thomas et al., 2009; 
Yushkevich et al, 2010). Results over three time-points appear 
to exhibit an additive bias, such that the majority of change 
occurs over the first interval, in a setting where such deceler- 
ation is not biologically plausible (Hua et al., 2011; Thomp- 
son et al., 2011). Intransitivity (where, for example, the sum of 
changes from A to B and B to C differs from the change esti- 
mated directly from A to C) can also be demonstrated (Leung 
et al, 2012). The increasingly significant potential of longitudinal 
MRI for clinical diagnosis, as a biomarker of disease progres- 
sion and as an outcome measure in treatment trials, has led to 
an increased focus on this issue (Fox et al, 2011; Reuter and 



See also the FSL SIENA tool, www.fmrib.ox.ac.uk/fsl/siena/index.html 
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Fischl, 2011), along with on-going controversy (Holland et al., 
2012). 

Many of the solutions proposed for avoiding bias from asym- 
metry are restricted to two time-points, such as the formulation 
of Tagare et al. (2009). There are three distinct approaches for 
attempting to address asymmetry and intransitivity across more 
than two time-points: all time-points of all subjects can be treated 
independently, i.e., processed cross-sectionally (e.g., Giedd et al., 
1999); each time-point can be registered to every other time-point 
within -subject, with appropriate adjustments to ensure consis- 
tency (e.g., Leung et al., 2012); or all time-points can be registered 
to some form of within -subject average image (e.g., Skrinjar et al., 
2008; Reuter et al., 2012a). The first of these results in low or zero 
bias but very high variance, while the second is computationally 
infeasible for high- dimensional diffeomorphic image registration. 
We therefore propose a solution based on the third approach. Reg- 
istration to an average is defined as an "indirect" approach by 
Tagare et al. (2009), in contrast to "direct" approaches which sym- 
metrize an individual pairwise registration; however, in this work, 
we attempt to address some of the more subtle differential geo- 
metric issues raised by Tagare et al. (2009) in their direct setting. A 
further novelty of our approach is the unification of rigid registra- 
tion, diffeomorphic registration, and correction of (differential) 
intensity inhomogeneity in a single generative model (in the same 
spirit as Ashburner and Friston, 2005). The algorithms described 
in this Technology Report will be made available in the Statis- 
tical Parametric Mapping (SPM) 3 software with the next major 
release, SPM12. 

2. METHODS 

Our proposed model involves combining rigid-body registration, 
intensity inhomogeneity correction, and non-linear diffeomor- 
phic registration. Because of all the conditional dependencies 
among model parameters, naive approaches involving pipelines of 
steps are likely to lead to less optimal solutions. For example, accu- 
rate inhomogeneity correction requires the images to be aligned, 
whereas more accurate alignment is only possible after inhomo- 
geneity correction. Our solution involves performing all steps in 
an interleaved fashion, such that model parameters are optimized 
together. Rigid-body and diffeomorphic registration are combined 
because the boundary conditions of the diffeomorphic registra- 
tion do not permit rotations of the whole image. In addition, 
the inclusion of translations in the rigid-body model allows the 
diffeomorphic model to not penalize pure displacements. 

In the remainder of this section, the individual components 
will be introduced separately, before proceeding to describe the 
combined model and some of the implementational details. First 
though, we say something about how the within -subject template 
space is defined. 

2.1. TEMPLATE POSITION AND DIMENSIONS 

The SPM software uses the concept of voxel- to -world mappings. 
These are encoded in the image headers, and are used to determine 
the real world locations (y) of voxel indices (x). Usually, they are 



5 Available from www.fil.ion.ucl.ac.uk/spm/ 



read from the "sform" fields of the NI1TI header 4 , and encode 
affine transformation matrices (M), such that 
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Each of the individual images is assumed to have such a voxel- 
to-world mapping associated with it (M„). The first step of the 
algorithm involves computing suitable dimensions and a voxel- 
to-world mapping (M M ) for the within-subject template. The aim 
is to have this template in some form of average position such that 
bias introduced by interpolation is minimized. 

As mentioned above, some pairwise registration approaches 
have suggested using square roots of transformation matrices for 
determining the half-way position for approximately symmetriz- 
ing the registration (Smith et al., 2001; Thomas et al., 2009; Yushke- 
vich et al., 2010). Several forms of transformation are members of 
matrix Lie groups (Woods, 2003), so it is illustrative to consider 
one of the simplest Lie groups - the positive scalars under multi- 
plication. As emphasized in Leung et al. (2012), the conventional 
square root of a positive scalar can be seen as the geometric mean 
of the scalar and unity (the identity for positive scalars). The sim- 
ple geometric mean of a set of points can be shown to minimize 
the sum squared distances from the mean to all points under a 
logarithmic distance metric; this property defines a Frechet mean, 
sometimes known as a Karcher mean, where the latter is only 
required to be at a local minimum (Woods, 2003). The geomet- 
ric mean is also an exponential barycenter (Pennec and Arsigny, 
2012), in the sense that the signed logarithmic discrepancies from 
the mean to all other points sum to zero. Considering now the Lie 
group of three-dimensional rotation matrices [known as SO(3), 
the special orthogonal group], an appropriate Riemannian dis- 
tance metric and Frechet mean (also the exponential barycenter) 
can be analogously defined (Moakher, 2002). Unfortunately, in 
the more general groups of rigid-body or affine transformations, 
it can be shown that no bi-invariant Riemannian metric exists 5 . 

Three different compromises have been used in the litera- 
ture to circumvent the lack of bi-invariant metric: Reuter et al. 
(2012b) compute a simple Euclidean mean of rigid-body transfor- 
mations before using the singular value decomposition to factor 
out any resultant zooms and shears; (Leung et al., 2012) use 
Arsigny 's log-Euclidean mean (Arsigny, 2006), which is a distance- 
minimizing Frechet mean under a sub -optimal metric that is not 
bi-invariant; (Woods, 2003) use the exponential barycenter (Pen- 
nec and Arsigny, 2012) which corresponds to a bi-invariant mean, 
relaxing the manifold to a semi- Riemannian one without a true 
metric (and hence without a Frechet mean), but still having a set 
of tangent space vectors from the mean that sum to zero. 



4 See http://nifti.nimh.nih.gov/nifti- 1 

5 Bi-invariance requires that composing both transformations with a third should 
not change their distance, and that inverting both should also preserve the distance; 
in the simple positive scalar case, the logarithmic distance between 1/2 and 1/4 is 
the same as that between 1/6 and 1/12, and the same as that between 2 and 4. 
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Our implementation is also based on the exponential barycen- 
ter, which in theory could be directly used to define the average 
position template. Unfortunately, because voxel sizes are not nec- 
essarily isotropic, there are some potential difficulties with using 
this. In particular, it is possible for the exponential barycenter to 
be a transform that can not be encoded by axis-aligned scaling (to 
account for voxel sizes), rotating, and translating (i.e., nine para- 
meters). It would not be possible to encode such a matrix in the 
"qform" fields of a NlfTI header, and it would also make some 
of the subsequent processing slightly more involved. Therefore, 
the closest nine -parameter affine transform to the exponential 
barycenter is estimated via a Gauss-Newton optimization strategy 
that minimizes the Euclidean distance between the matrices. 

Finally, the dimensions of the template are determined such 
that it easily covers the field of view of the images. This is achieved 
by projecting the locations of the corners of the images into the 
template space and finding the maximum and minimum val- 
ues. Corresponding adjustments are also needed for M M , which 
involves changing the translations. 

Once the dimensions, etc., of the template are defined, the algo- 
rithm can proceed to fit the combined model. The next three 
subsections outline the individual components, and how they 
would be optimized in isolation. This will be followed by a subsec- 
tion about combining the components together into the unified 
model. 

2.2. GROUP-WISE RIGID-BODY REGISTRATION 

Each image is considered as a scalar function of space, such that 
f n : Qf n -> R, where Qf n C R 3 . A series of N images {f n }„ = i may 
be modeled as a common template mean (/x : —> R), which 
is rotated and translated by a rigid-body transformation. Additive 
Gaussian noise (of variance II X n ) is assumed to be constant over 
each image and known, but may vary from image to image. 

When dealing with rigid-body transforms, it is useful to con- 
sider them in terms of their membership of the special Euclidean 
group in three dimensions [S£(3)]. Within this framework, a 
transformation matrix (R q ) is constructed via an exponential 
mapping of the six parameters (q) that constitute the Lie algebra 
of the group. 



Rn 



exp 



o ^4-^5 qi 

-q 4 0 q 6 q 2 
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This involves a matrix exponential, rather than computing the 
exponential of each element. Although there are many ways to 
compute this (Moler and Van Loan, 2003), it is usually defined as 



expQ = £-Q" 



(3) 



n=0 



Rigid alignment among a set of images can be performed by 
estimating the optimal mapping between the template and each 
of the images. A mapping from voxel indices in the template, to 
those in the nth image, is given by the following affine transform: 



§ qn (x) = fe^M-^M^ 



where 13,4 = 



1 0 0 0" 
0 10 0 
0 0 10 



(4) 

This leads to the following objective function for group-wise 
rigid-body alignment: 



S 



£y||/«-KC 
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(5) 



Without constraints, this simple approach may lead to a vari- 
ety of equivalent solutions in which the template is rotated and 
translated by some arbitrary amount. The Lie algebra of the SE(3) 
group is used to parameterize the transforms to make is easier 
factor out the Frechet mean of all the rigid-body transforms, thus 
ensuring that the template remains in the average position. All that 
is required to achieve this is to ensure that ^ „=i = 0> which 
can be achieved simply by subtracting the mean after all the q n 
have been re-estimated. 

For reasons that should become apparent later, we formulate 
the objective function as 



N 

n=l JxeQ' n 
N 

such that — 0- 



(6) 



n=l 



In the above equation, the D operator refers to computing the 
Jacobian, the determinants of which are included to account for 
a change of variables. The field of view common to both the nth 
image and the template is denoted by Q! n = (Qf n ) H Q^. 

The optimization strategy involves alternating between re- 
estimating the mean (/x), and then sequentially using this to re- 
estimate the registration parameters. The update of [i is achieved 
by differentiating equation (6) with respect to \i and solving. 



/x(x) = 



, where 



w„ (x) 



X n \Di; 
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ifxe Q' n 
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(7) 



A Gauss-Newton optimization iteration is then done for each 
of the N scans. Dropping the n subscripts for notational simplicity, 
this involves the following: 



(8) 



It requires the vector of first derivatives, which are computed 
by differentiating equation (5) with respect to q, and applying the 
same change of variables as earlier (see Appendix A), giving: 
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where a = AID£ q l(/(£q) — /x),g = an d 



hi (x) = ^M^RT 1 9Rq 



m q 



(10) 



For a more stable algorithm, the matrix of second derivatives 
(Hessian matrix) should be positive definite. A suitable positive 
definite approximation to the Hessian (see Appendix A) is: 



2 -^~- [ w (x) (g (x) • hi (x)) (g (x) • hj (x)) dx. 



(ID 



Our implementation makes use of obtained by differenti- 
ating equation (3), although this could also have been computed 
numerically using finite differences or by more elegant methods 
(Al-Mohy and Higham, 2009). The overall optimization scheme is 
presented in Algorithm 1. 

Algorithm 1 | Rigid-body registration. 

for n = 1...N do 

Initialize q n . 
end for 
repeat 

Compute fi and V/x (equation 7) . 
for n = 1...N do 

Compute 1st derivatives (equation 9) . 
Compute Hessian (equation 11) . 
Gauss-Newton update of q n (equation 8) . 
end for 

4 *~ N 2^n=l 4/i- 

for n = 1...N do 
q« +- q n - q. 
end for 
until convergence 



2.3. GROUP-WISE INH0M0GENEITY CORRECTION 

MRI scans are usually corrupted by a spatially smooth intensity 
non-uniformity (also known as "inhomogeneity" and sometimes 
referred to as "bias"), which is often corrected prior to image reg- 
istration using a procedure such as N3 (Sled et al, 1998). Instead, 
we propose that the additional internal consistency from incor- 
porating the non-uniformity correction within the longitudinal 
registration scheme should provide potentially more accurate 
results. Studholme et al. (2004), Modersitzki (2006), and Modat 
et al. (2010) previously used registration-based non-uniformity 
corrections, although those approaches did not consider aspects 
of inverse (or group-wise) consistency. 

Here, we outline a general strategy for dealing with non- 
uniformity fields in aligned images. This part of the model assumes 
that a series of N aligned images (f n ) maybe modeled as a common 
template mean (/x), scaled by non-uniformity fields. Because these 
fields should be positive, they are modeled using an exponential of 



a Gaussian process (e bn ), where b n : £2f n 
to the following objective function. 



L This model leads 
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Note that L b is a differential operator that penalizes the rough- 
ness of the (logarithms of the) estimated non-uniformity fields. 
A Laplacian is used in practice, although the optimal choice of 
differential operator will depend on the nature of the artifacts in 
the image data. 

Regularized maximum likelihood (or maximum a posteriori) 
optimization of the inhomogeneity fields may be achieved by 
minimizing the above function. As in the rigid registration case, 
we propose alternating between re-estimating the mean and then 
using this to re-estimate the inhomogeneity fields. 

The update of \x is achieved by differentiating equation (12) 
with respect to /x, and solving. 



EN n r 

n=l A njn e 

L«=i x ne- 



2b n 



(13) 



Gradient descent could then be used to update the inhomo- 
geneity fields, which requires the first derivatives of the objective 
function. These are computed via their Gateaux differential. 



dS (b n ; h) 



d 1 

dr 2 



T=0 



= / 



K\\fn-Lie b «+ Th f + \\L b (^ + r /z)|Q 
On (x) [i (x) h (x) dx + [l\ L b b n ,1tj (14) 



where a n = X n e bn (f n — fie bn ) . 

In our implementation, the fields are updated via a Gauss- 
Newton step, which makes use of a positive definite approximation 
to the second derivatives (derived using similar principles to those 
in Appendix A). 



d 2 8 (b n ;hi,h 2 ) 



dx\dx2 2 



2 \f n 



lie 



b n + T\h\+T2h2 



+ ||L 6 (^ + Ti/Zi + T 2 /Z 2 )|Q 

~ / w n (x) fi(x) 2 hi (x) h 2 (x) dx + Ul^hu h 2 
JxeQu x 



Tl =0,T 2 =0 



(15) 



2b n 



where w n = X n e 

The overall procedure is summarized in Algorithm 2, which has 
a similar overall structure to Algorithm 1. The non-uniformity 
fields are encoded by a parameter at each voxel, such that con- 
tinuous representations [of b(x)] may be obtained via tri-linear 
interpolation. This parameterization results in a diagonal matrix 
for the first term in equation (15), making it relatively straight- 
forward to solve the system of linear equations required for the 
update via a full multi-grid (FMG) method. 
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Algorithm 2 | Non-uniformity field estimation. 

for n = 1...N do 

Initialize b n . 
end for 
repeat 

Compute fi (equation 13) . 
for n = 1...N do 

Compute 1st derivatives (equation 14) . 
Compute Hessian (equation 15) . 
Gauss-Newton update of b n (via FMG) . 
end for 

for n = I...N do 

b n <- b n - b. 
end for 
until convergence 

Although the model appears to incorporate a number of un- 
necessary parameters (i.e., N non-uniformity fields plus a mean 
image), it effectively involves only N — 1 fields. The mean image 
is incorporated for convenience, and the geometric mean over 
all the estimated non-uniformity fields converges to one (given 
appropriate regularization). In practice, this constraint may be 
incorporated in the optimization (as shown in Algorithm 2) , which 
enhances convergence. When there are two images, we may assume 
^2 = — b\, in which case the objective function reduces to 



XiX 2 f (fi(x)e b ^ -f 2 (x)e- b ^) 2 



2 L 



(16) 



This special case may have other applications, such as the compu- 
tation of smooth ratios of MR scans for mapping of RF transmit 
fields (Lutti et al., 2010). 

2.4. GROUP-WISE DIFFE0M0RPHIC REGISTRATION 

From a generative modeling perspective, the objective function for 
group-wise registration may be written as 

£ = E(yll/«-M°0;„ 1 |) 2 + ^|Lv„v„| 2 d7) 

n=l ^ ' 

where 0 Vn : £2 M —> £2^ is a diffeomorphic mapping, constructed 
from the initial velocity field v n : —> R 3 . Each image (f n ) is 
assumed to be a warped version of a template (/x), with added 
noise. Smoothness of the diffeomorphisms are achieved via the 
differential operator Ly, which may differ from image to image. 

It is now reasonably well known that a diffeomorphism (0 V ) 
may be computed from an initial velocity field (v) using a proce- 
dure known as geodesic shooting (Miller et al., 2006). Essentially, 
this procedure is based on integrating a particular form of dynam- 
ical system over unit time, and relies upon the principle of con- 
servation of momentum. The procedure begins by initializing 0 V 



to the identity transform and computing the initial momentum 
from the initial velocity via 



u = L v LyV. 



(18) 



Then the following dynamical system is integrated over unit 
time. 

0 v =(k v (|D0; 1 |(D0; 1 ) T (uo0; i )))o0 v (19) 

Briefly, the initial momentum is re-sampled according to the 
inverse of the current estimate of 0 V . Each point in the resulting 
field is matrix-multiplied by the transpose of the Jacobian tensor at 
that point of 0" 1 and rescaled by the determinant of the Jacobian. 
Finally, the result is smoothed by applying the K v operator, which 

is the Greens function of L^Ly (see Bro-Nielsen and Gramkow, 
1996), to give the velocity field that provides the next update for 
0 V . The K v operator maybe viewed as a low pass filter, which is the 

(pseudo-) inverse of L^Ly, such that K v L^LyV = v. In practice, we 
use a slightly different integration scheme, which was described in 
Ashburner and Friston (2011). 

We now re-write Equation (17) via integration by substitution. 



N x r 

Nv„ W I (fn o 0 Vn (X) - IX (X)) 2 dx 
n=l 1 JxeQ ^ 

1 N 

+ ^I]l L v« V «| 2 



(20) 



n=l 



The template update equation is obtained by differentiating 
equation (20) with respect to /x and solving. 



M = 



V N 1 1 

2^n=l A n \ 




(fn c 




2-^ n= 


1 X-n | 







(21) 



Registration is treated as an optimization procedure, using 
both first and second derivatives. The first Gateaux differential 
is computed via 

dS (v„; h) = / a n (x) g (x) h (x) dx + /l^Lv w v„, h\ (22) 

where a n = X n |D0 Vn | (f n o </) Yn — fij . The template gradients (g) 
used for the registration are computed (see Appendix B) as follows. 



g = 



J2n=l X w |D0 v JV(/nO0 v J 



(23) 



A Gauss-Newton step is used to update the estimates of the ini- 
tial velocity, which also requires a positive definite approximation 
to the second derivatives. 

d 2 8 (v B ; hi, h 2 ) - f w n (x) (g (x) • h : (x)) (g (x) • h 2 (x)) dx 

+ (^^1^2) (24) 
where w n = X n |D0 V I . 
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Further practical details about the implementation may be 
found in Ashburner and Friston (201 1), with some explanation of 
how the system of linear equations are solved via a full multi-grid 
approach in Ashburner (2007). 

Algorithm 3 | Diffeomorphic registration. 

for n = 1...N do 

Initialize v n . 
end for 
repeat 

for n = 1...N do 

Compute 0 Vn and D0 Vn from v n using 
geodesic shooting, 
end for 

Compute mean and gradients 
[equations (21) and (23)]. 
for n = 1...N do 

Compute 1st derivatives (equation 22) . 

Compute Hessian (equation 24) . 

Gauss-Newton update of v n (via FMG) . 
end for 

(4„Lv„)vn- 

for n = 1...N do 

v n <r- y n - K Vn u. 
end for 
until convergence 



2.5. COMBINING THE COMPONENTS 

The registration procedure combines group-wise rigid-body and 
diffeomorphic registration with intensity non-uniformity correc- 
tion. The model assumes that each image (f n ) is a deformed version 
of a template image (/x), scaled by the exponential of an inhomo- 
geneity field (b n ), with a known amount of additive i.i.d. Gaussian 
noise (precision X n ). Fitting this model (see Figure 1) involves 
minimizing the following objective function: 

N 1 / f 2 

£ = E 2 / Xn \ D<Pn ^1 ( f * (x) " M (x) eK(x) ) dx 

n=l V xeQ n 

+ |L Vn v w | 2 +||I^„|| 2 ) (25) 
with the following definitions: 

fn = fn (<Pn) 
K = b n (<Pn) 

The images (f) and template (/x) are treated as continuous func- 
tions, but are actually encoded via B-spline basis functions. This 
is done to achieve continuous spatial gradients, which are needed 
for the registration. Velocity (v) and logarithms of inhomogeneity 
fields (b) are also treated as spatially continuous, and represented 
using tri-linear interpolation. 




FIGURE 1 | A graphical representation of the full model. Each of the N 
images (f) is assumed to be a deformed version of the template hi) scaled 
by a multiplicative inhomogeneity field [exp(b)] with additive Gaussian noise 
(precision X.). Each deformation is modeled by the composition of a 
rigid-body transform (£) parameterized by a vector of six parameters (q), 
and a diffeomorphic deformation (0) parameterized by its initial velocity (v). 



We now introduce a few more definitions that will be useful 
later. The mean image is needed for all steps, and is computed by 



/x 



EN -h' rr 

n=l w n e b nf n 



where 



w n (x) 



EN 9 
n=l w n 

\ n \D<p n (x)\e 2b n(x) if x £ Q' n 



0 



otherwise 



(26) 



(27) 



In addition, the following gradients are required for driving the 
image registration 



g = 



One other definition is 

a n = w„ {f' n e~ h '<< - • 



(28) 



(29) 



The derivatives used for updating the rigid-body and diffeo- 
morphic transformations are computed by substituting the above 
expressions. To compute the first derivatives, equations (28) and 
(29) would be substituted into equations (9) and (22). For the 
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approximate second derivatives, equations (27) and (28) would be 
substituted into equations (11) and (24). 

The main difficulty arises from combining the inhomogeneity 
estimation with the registration. Others have used an elegant meta- 
morphosis approach (Trouve and Younes, 2005) for integrating 
intensity variations with deformations, but this is not appropriate 
here. In the current model, intensity variations are assumed to be a 
function of the MR scanner rather than an intrinsic property of the 
brain itself. Therefore, the inhomogeneity fields are estimated in 
the space of the original images. Our registration scheme involves 



Algorithm 4 | Combined model estimation. 

for n = 1...N do 

Initialize q n , b n and v n . 
end for 
repeat 

for n = 1...N do 

Compute 0 V and D0 V fr 
(see Ashburner and Friston (2011) ) . 
end for 

Re-compute fx and g 
[equations (26) and (28)]. 
for n = 1...N do 

Compute |^ (equation 9, 
using equation 29) . 

Compute (equation 11, 

using equation 27) . 

Gauss-Newton update of q n 
(equation 8 ) . 
end for 
- . J_ y-N 

for n = 1...N do 

q« <- q« - q- 

end for 

Re-compute fi (equation 26) . 
for n = 1...N do 

Compute d£(b n ;h) (equation 30, using 
equation 29) . 

Compute d 2 £(b n ;hi>h2) (equation 31, 
using equation 27) . 

Gauss-Newton update of b n . 
end for 

Re-compute fx and g (equations 2 6 and 28) . 
for n = 1...N do 

Compute d£(v n ;h) (equation 22, 
using equation 27) . 

Compute <i 2 £(v w ;hi,Ii2) (equation 24, 
using equation 27) . 

Gauss-Newton update of v n . 
end for 

fi^Etl (4„L v „)v„. 
for n = 1...N do 

v n <r- y n - Ky n u . 
end for 
until convergence 



estimating mappings from the mean template to the individual 
scans. We can re-sample the original scans to bring them in align- 
ment with the template, but sampling the template to align it with 
the individual scans would require additionally computing the 
inverse deformation, adding an additional level of computational 
complexity. Our solution is to first compute derivatives of the data 
term in the space of the template, making use of the Jacobians of 
the deformations in a substitution of variables. 

Equations (14) and (15) may be re -written to incorporate 
equations (27) and (29). 

d£ (b n ; h) = I - \T>(p~ l (x) | (a n fi) o (p~ l (x) h (x) dx 

JxeQf n 

+ (4 L fe & «> /l ) ( 3 °) 

d 2 £ (b n ; hi, hi) ~ f \D<p- 1 (x)| (w nl x 2 ) o <p- 1 (x) 

JxeQf n 

/z 1 (x)/z 2 (x)^x + (4^/z 1 ,/z 2 ) (31) 

This enables the derivatives to be first computed in the 
space of the template, and subsequently pushed forward to the 
space of the original images. The inhomogeneity fields are then 
re-estimated by an iteration of Gauss-Newton, although they are 
not mean-corrected. 

The overall optimization scheme is shown in Algorithm 4. 

2.6. IMPLEMENTATION DETAILS 

Our implementation is written in a mixture of MATLAB and 
C code (mex files for the computationally expensive parts). For 
additional speed (and accuracy), the overall procedure is run over 
multiple spatial scales, beginning at the lowest resolution. At each 
scale, a solution is computed, which is prolonged to the next scale 
where it serves as a starting estimate for the next set of iterations 
at a higher resolution. 

The implementation has large memory requirements, which 
are likely to exceed the addressable memory of 3 2 -bit comput- 
ers. To save some memory, many of the computations are done 
using single precision floating point. Briefly, the main memory 
consumption comes from: 

• Image data (f n ). If each image contains / voxels, the memory 
for all images will be 4JN bytes. For example, a 256 x 256 x 256 
image (where /= 16777216) requires 64MB to represent it as 
single precision floating point. 

• Inhomogeneity fields (b n ) require 4JN bytes. 

• Template (/x) and its spatial gradients (g) require 4/^ + 12/ M 
bytes, where / M is the number of voxels in the template. 

• Velocity fields (v M ) require 12/^iV bytes. 

• Deformation fields (0 Vn ) require 12] bytes. 

• Jacobian fields (D0 Vn ) require 36/ M N bytes. 

The geodesic shooting step consumes a lot of additional mem- 
ory. This includes 36/ M bytes for the Fourier transform of the 
Greens function (K), 12/ M bytes for the initial momentum (u M ), 
plus 96/ M bytes for composing diffeomorphisms and their Jaco- 
bians. The maximum requirement at any point is just over 
144^ + ^(87 + 60^) bytes. 

The algorithm requires a few user- defined settings, which will 
now be outlined. 
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2.6.1. Noise estimates 

The model assumes that images can be closely aligned, such that 
most of the residual variance is scanner noise. One of the settings 
needed by the algorithm is an estimate of the noise variance of 
each scan (^-). This may be assigned by the user, although our 
implementation defaults to using values estimated from the MR 
scans themselves. 

Because MR images usually encode the magnitude of complex 
data, they have Rician noise. An estimate of the variance of this 
noise can be made by fitting a mixture of two Rician distributions 
to an intensity histogram from an image (see Figure 2). A reason- 
able scanner noise estimate may then be obtained from the smaller 
of the two estimated variances. The fitting procedure is similar to 
the expectation maximization for fitting a mixture of Gaussians, 
although the SNR fixed point formula (Koay and Basser, 2006) is 
used to compute the Rician parameters from the sample means 
and standard deviations. 

Noise drawn from a Rician distribution deviates most from 
Gaussian in regions of low signal intensity, where it is closer to 
the Rayleigh distribution (Rician with zero signal). Although the 
mean-squares difference noise model (which assumes residuals 
are drawn from a Gaussian distribution) differs from the Rician 
assumptions, it is probably close enough within the regions of the 
image in which we are mostly interested. Even in the worst case, the 
difference between noise drawn from two Rayleigh distributions 
is close to Gaussian. 

2.6.2. Differential operator for inhomogeneity 

Without regularization, all the differences among the images 
would be explained by the estimated inhomogeneity fields. There- 
fore, their estimation needs to involve regularization, which will 
depend on the nature of the artifacts present. If there is no 
inhomogeneity, then very strong regularization may be used, 
whereas less would be used in the presence of large artifacts. Our 




Image Intensity 

FIGURE 2 | A mixture of two Rician distributions fit to an MRI intensity 
histogram (shown dotted). The fit is shown as a continuous line, whereas 
dashed lines are used to show the two Rician distributions. 



implementation regularizes by minimizing the bending energy of 
the fields (\\L b b\\ 2 = co 0 f xeQ ||V 2 fr(x)|| dx, where coo is a hyper- 
parameter controlling the strength of the regularization), which 
gives generally smooth estimates. Neumann boundary conditions 
are imposed on b, such that the gradient is zero at the edge of the 
field of view. 

2.6.3. Differential operator for d iff eomorph isms 

The solution of any image registration problem is heavily depen- 
dent on the choice of differential operator used to regularize it. 
Although there are principled (Bayesian) ways to optimize the 
operator (Simpson et al., 201 1), such a strategy would be beyond 
the scope of the current work. Instead, a relatively ad hoc choice 
about the form of operator was made, although there were still 
some principled aspects involved. 

The differential operator involved no penalty against absolute 
displacements, which has implications for the Greens function 
(K v ) used by the geodesic shooting procedure. Such forms of reg- 
ularization operator can not be inverted exactly to obtain a unique 
K v . Our implementation of geodesic shooting used Fast Fourier 
Transform (FFT) methods to obtain the Greens function. Because 
the DC coefficient of the FFT of the differential operator is zero, 
its reciprocal is infinity. We therefore set the DC coefficient of 
the Greens function to zero, and let the rigid-body registration 
account for global translations. 

Many registration approaches use a Green's function that is 
simply a Gaussian. We chose to avoid this, as such functions privi- 
lege certain spatial scales above others (and also penalize absolute 
displacements). Some have instead used a K v consisting of a mix- 
ture of Gaussians to account for multiple spatial scales (Risser 
et al, 2011). Our approach involves using a combination of the 
linear- elasticity and bending energy (or thin-plate) models. 

H^ll 2 = I (°T ll Dv « + ( Dv ( x )) r | 2 + ^2tr(Dv(x)) 2 
+a) 3 |V 2 v(x)| 2 ) dx (32) 

Three hyper-parameters are involved: 

• co i controls the amount of stretching and shearing (but not 
rotation). 

• 0)2 controls the divergence, which in turn determines the 
amount of volumetric expansion and contraction. 

• C03 controls the bending energy. This ensures that the resulting 
velocity fields have smooth spatial derivatives. 

The Neumann boundary condition could not be used for the 
velocity fields, so these are assumed to be circular (the same as for 
a Fourier transform). 

2.6.4. Acquisition timing adjustments 

When there are only two scans to align, it is natural to encode the 
template at the point half-way between them. However, when the 
number of scans is greater than two, the choice of what time-point 
the template corresponds to is more arbitrary. If the regulariza- 
tion is the same for aligning all scans with the template, then 
the natural point is the average time of all scans. Generally, we 
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expect larger deformations for situations where the time interval 
between the template and scan is greater. Therefore, the regu- 
larization is adjusted for each scan so that it is (approximately) 
inversely proportional to the absolute time difference between the 
template and scan acquisition time. We also assume that the time 
for the template corresponds to the median of the acquisition 
times. Accounting for this requires the warping regularization to 
be adjusted, such that the penalty is in terms of an energy measure 
per unit of time. If the interval between template and scan is t n 
units (e.g., years), the penalty for the deformation is defined as 
\ ||Lv n v w | 2 = 1 1 LyV w 1 1 2 , where Ly encodes the penalty for one 
unit of time difference. 

2.6.5. Integrations 

The equations within the Methods section describe a continuous 
setting, whereas our actual implementation replaces the integra- 
tions over space with summations sampling the voxel centers. 
This sort of approximation is widely used for image registra- 
tion, although it probably accounts for much of the findings in 
Yushkevichet al. (2010). 

The integrations over time, which are used by the geodesic 
shooting procedure, also need to be discretized. These are cur- 
rently done using an Euler integration scheme, which uses three 
steps per unit of time difference, plus an additional two steps to 
account for some of the larger distortions usually found in the soft 
tissue outside the skull. Faster registration could be achieved using 
fewer time steps, although this may lead to decreased accuracy. 

3. RESULTS 

It is difficult to evaluate models when there is no ground truth 
available. Although in theory, Bayesian model comparisons could 
be performed, these would require computations that are not cur- 
rently feasible for very large models. Therefore, the evaluations are 
mostly anecdotal. 

3.1. EFFECTS OF REGULARIZATION 

The effects of regularizing diffeomorphisms using different 
choices of penalty is demonstrated using simulated 2D data. This 
involves the two images shown in Figure 3, which have dimensions 
of 256 x 128 pixels, and intensities ranging from 0 to 1. The areas 
of the circles and ellipses in the simulation were all approximately 
the same. These were registered together using the diffeomorphic 
framework - but without intensity inhomogeneity correction - 
using a variety of different forms of regularization. Results are 
shown in Figures 4 and 5. The first thing to notice is that the 
warped images all look relatively similar to each other, but the 
deformations (and their Jacobian determinants) differ markedly. 
The choice of regularization will play a significant role in any study 
where the aim is to localize volumetric differences. 

1. The first form of regularization (coi = 0.001, 002 = 0.001, and 
co 3 = 0.1, with an additional penalty on the square of absolute 
displacements of 0.0001) is intended to demonstrate the effect 
of using a Greens function that is approximately Gaussian. The 
behavior of such a kernel is such that the Jacobians of the result- 
ing deformations are more extreme. For real longitudinal data 
in Alzheimer's disease, where ventricles expand over time, it 























FIGURE 3 |The two simulated images. 


2. 


will give the impression that brain atrophy is localized to the 
regions close to the ventricles. 

The second form (co\= 0.001, 0)2 = 0.001, and 0)3 = 2.0) is 



dominated by a penalty against the bending energy of the defor- 
mations. For expanding ventricles, it may give the impression 
that brain tissue around the ventricles also expands. In gen- 
eral though, this form of regularization has a number of nice 
properties, which include scale invariance. 

3. The third form is dominated by the penalty against length 
changes. Unfortunately, within a continuous framework, the 
Greens function for this one is sharply peaked, such that the 
value at the center is infinity. 

4. The fourth form predominantly penalizes the divergence of the 
velocity fields, which tends to push the Jacobian determinants 
toward a value of one. In the bottom right of Figures 4 and 5, 
we see that the estimated volumetric differences are very small. 
The pure form of this regularization also has a Greens function 
with a singularity. 

The optimal form of regularization is likely to involve a com- 
bination of the above. Although liable to have a large impact on 
the accuracy of image registration algorithms, the neuroimaging 
literature contains little on the subject. 

3.2. REAL LONGITUDINAL MRI 

The algorithm was evaluated using data downloaded from Part 
1 of the OASIS Longitudinal MRI Data in Non-demented and 
Demented Older Adults 6 dataset (Marcus et al., 2010). This con- 
tained longitudinal scans from 82 subjects (from OAS2_0001 to 



6 Freely available from http://www.oasis-brains.org/ 
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FIGURE 4 | Warped simulated images. First column: upper image warped to 
match lower image (from Figure 3). Second column: deformation fields. Third 
column: logarithms of Jacobian determinants (color-scales are the same for all 
examples, and in the range of —3 to 3). First row: results from co^ =0.001, 



co 2 = 0.001 , and co 3 = 0.1 , with an additional penalty on the square of absolute 
displacements of 0.0001. Second row: results from co, = 0.001 , co 2 = 0.001 , 
and o) 3 = 2.0. Third row: results from co, = 0.05, co 2 = 0.0001 , and co 3 = 0.0001. 
Fourth row: results from — 0.001 , co 2 — 0.5, and co 3 = 0.001. 



OAS2_0099), each with data from between two and five time- 
points. Data from each time-point consisted of between two and 
five MRI scans, permitting improved signal to noise ratios via aver- 
aging. Further demographic information about the subjects may 
be obtained from the OASIS web site. 

The first step of the processing was to create averages of the 
scans from each time-point. Because subjects may move slightly, 
these averages were computed after a group-wise rigid-body align- 
ment. This was achieved using the estimated template from 
the group-wise registration - but with non-linear deformations 
disabled. 

The evaluations were based on averages over a number of 
scans, where the noise is no longer Rician. To simulate typical user 
behavior (default settings), they were done using noise estimates 
obtained by fitting Rician distributions to non-Rician noise. 

3.2.1. Pairwise symmetry 

The first test was to assess whether the procedure is actually inverse 
consistent if run pairwise. This simply involved aligning the first 
and second time-points of a pair of longitudinal images, and 



assessing whether the results were compatible with those from 
aligning the second and first. They were found to be exactly 
consistent. 

3.2.2. Anecdotal example 1 

The first illustration uses data from a 75-year-old (at first scan) 
right handed male with mild cognitive impairment (OAS2_0002, 
MMSE = 22, CDR = 0.5). Although there were three images for 
this subject, we just ran the algorithm using the first and last, 
which were collected 1869 days apart. The primary aim was to 
show the decrease in residual difference, after both inhomogene- 
ity correction and registration. This provides an indication of 
how well the registration works, but does not give the full story 
(Rohlfing, 2012) because some very implausible deformations may 
also greatly reduce the residuals. Jacobian determinant maps are 
also shown, which tell us about the plausibility of the volumetric 
changes involved. These results are shown in Figure 6, with more 
detail around the right hippocampus shown in Figure 7. 

The first thing to note is that the registration greatly reduced 
the residual difference. After registration, there is little remaining 
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FIGURE 5 | Warped simulated images. First column: lower image warped to 
match upper image. Second column: deformation fields. Third column: 
logarithms of Jacobian determinants (color-scales are the same for all 
examples, and in the range of -3 to 3). First row: results from co : =0.001, 



co 2 = 0.001 , and co z = 0. 1 , with an additional penalty on the square of absolute 
displacements of 0.0001. Second row: results from co : = 0.001 , co 2 = 0.001 , 
and a) 3 = 2.0. Third row: results from co } = 0.05, co 2 = 0.0001 , and oj 3 = 0.0001. 
Fourth row: results from co, — 0.001 , co 2 = 0.5, and co 3 = 0.001. 



structure to be seen in the residuals - particularly within the brain. 
We note also that the estimated Jacobian determinants seem to 
be plausible. A comparison between the residuals depicted in the 
bottom center and bottom left of the Figures shows the effect 
of the inhomogeneity correction, which also reduces the residual 
difference for these data. 

The brain is enclosed within the skull, so there is relatively little 
external influence on its shape. This may be contrasted with the 
soft tissue outside the skull, which shows extensive shape changes 
due to the subject's head positioning within the scanner. The Jaco- 
bian images (Figure 6) also show some artifacts along the right 
hand edge. These result from the image data wrapping around 
(see close to the right hand edge on the coronal view of the second 
time-point image). 

3.2.3. Anecdotal example 2 

The second illustration uses data from a 66-year-old male with 
dementia (OAS2_0048, MMSE= 19, CDR= 1). There were five 
scans for this subject, collected over a period of 1233 days. Rigidly 
aligned versions of the images are shown along the top of Figure 8, 



with the corresponding maps of expansion (divergence of initial 
velocity) shown below. Note that the expansion map of the mid- 
dle time-point is almost zero, as that point served as the reference 
time for the group -wise alignment. Careful examination of the 
divergence maps also reveals what appear to be artifactual volume 
changes for the more prominent blood vessels. This effect was 
found in many of the subjects. 

3.2.4. Principal components 

Group -wise longitudinal registration was run for all 82 subjects' 
data. The region within the cranium of each subject was identified 
by running the "new segment" algorithm of SPM8 (Ashburner and 
Friston, 2005) on their aligned mean images (/x), and summing 
the estimated gray matter, white matter and CSF maps together. 
Divergence values from inside the cranium of each subject were 
collected, from which N x N Gram matrices were computed 
and normalized by the number of voxels. The Gram matrices 
were decomposed via an eigen-decomposition and the largest 
eigenvalues identified. The corresponding eigenvectors, scaled by 
the square root of the eigenvalues, are plotted in Figure 9. 
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FIGURE 6 | Illustration of the results obtained from matching a pair of 
images of a subject with mild cognitive impairment, which were 
collected 1895 days apart (OAS2_0002). The three images of the residual 



difference shown along the bottom are all windowed the same. Black 
indicates a value of -500 or less, whereas white indicates values of 500 or 
above. 



In general, the plots for multiple time-points appeared fairly 
linear, although we do notice a steeper gradient between the first 
two points for some of the subjects. Although it appears to be some 
form of artifact, we do not yet have a good explanation for it. 

3.3. MEAN IMAGES 

Images of mean expansion rate were computed for each subject, 
by fitting voxel-wise linear models through the divergence maps. 

A mapping between each subject's gray matter, white matter, 
and CSF and the population mean of these tissues was estimated 
using Dartel (Ashburner, 2007; Ashburner and Friston, 2009). In 
addition, an affine mapping between the population mean and 
MNI space was also estimated. The compositions of these map- 
pings were then used to warp each subject's mean (fi) and their 
expansion rate images to MNI space 7 . 



7 We note that this is the approach typically adopted within the SPM software, and 
that parallel transport (Younes et al., 2008) should instead have been used to warp 
the velocity fields to MNI space, prior to computing their divergences. 



A simple average (not weighted by Jacobian determinants) of all 
the warped mean images was computed. Similarly, simple averages 
of the warped expansion rate maps for the control subjects and 
subjects with dementia were also computed. These averages are 
shown in Figure 10, and clearly show the pattern of atrophy typ- 
ically found in aging and dementia. Outside the brain, age associ- 
ated skin thickness decreases can also be seen (Shuster et al., 2006). 

4. DISCUSSION 

Research in biology is about much more than collecting p-values. 
Ultimately, we want to understand the mechanisms behind brain 
growth, development, aging, and various disease processes. Mech- 
anistic models require internal consistency and plausible under- 
lying assumptions. When internal consistency is not achieved, 
it is an indication that something is wrong. We demonstrated 
that for pairwise registration, our approach gives consistent 
solutions - irrespective of the order of the images. 

Mechanistic models should be based ideally on well accepted 
underlying assumptions, which for image registration are that 
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FIGURE 7 | Detail of the results obtained from matching a pair of images of a subject with mild cognitive impairment, which were collected 1895 days 
apart (OAS2_0002). 
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FIGURE 8 | Data from the time-points of a subject with dementia (OAS2_0048). The top row shows the original scans after rigid alignment, whereas the 
bottom row shows the divergence of the estimated velocity fields. 



lengths, areas, and volumes should never fall below zero, growth model is that these Jacobian determinants must be 
Relative volumes are computed from deformations via the positive. Achieving this requires a diffeomorphic deformation 
Jacobian determinants, so a necessary condition for a valid framework. 
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FIGURE 9 | Plots of main eigenvector from each subject's 
divergence maps within the cranium. The columns show plots from 
those control subjects who were scanned only twice, plots from 
control subjects who were scanned more than twice, plots from 
subjects with dementia who were scanned twice, and plots from 



subjects with dementia who were scanned more than twice. Dotted 
lines show the best linear fit. Note that the plots are sorted according 
to their average slope, which was done for easier visualization. Some of 
the eigenvectors were also rescaled by -1, such that all the gradients 
are positive. 



Some readers may object to the use of large deformation dif- 
feomorphic registration 8 approaches for modeling the relatively 
small longitudinal changes seen in aging. In principle though, a 
related framework would also be applicable to modeling growth 
and development of the brain - or indeed any other organ - 
from fetus through to adult. Small- deformation approximations 
would fail for these more extreme changes, whereas a diffeomor- 
phic approach could potentially model volumes and lengths that 
change by orders of magnitude. 

Further modifications to the current approach would be 
required to account for intensity changes that are intrinsic to the 
brain. For example, the brains of young infants have changing 
appearances throughout myelogenesis. Similarly, white matter 



8 The alternative is to use small deformation assumptions, which suppose that devi- 
ations from the identity transform can constructed by addition of displacements, 
rather than composing a series of tiny deformations. 



hypo-intensities, stroke, etc., may be more properly explained 
by intensity changes rather than shape changes, although it is 
not always entirely clear what solution is more appropriate. Oth- 
ers are beginning to develop models for simultaneous shape and 
appearance changes (Trouve and Younes, 2005; Hong et al., 2012), 
although there is still much more to be done. Alternatively, greater 
robustness may be achieved by using a matching term other than 
the L 2 norm, for which residual differences ideally follow a Gauss- 
ian distribution. A distribution with fatter tails, such as a mixture 
of Gaussians, may be more appropriate for modeling outliers 
(Penny etal, 2007). 

The effects of changing the form and magnitude of the reg- 
ularization are still relatively unexplored. It is likely that the 
optimal amount will depend on the ultimate objectives of a 
study. More regularization will decrease the noise in the estimated 
deformations - at the expense of introducing bias. Less regular- 
ization will decrease the bias, at the expense of fitting noise. If the 
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FIGURE 10 | Mean images. Left: average of all subjects' warped mean 
images. Center: average of the warped expansion rate maps of the control 
subjects. Right: average of the warped expansion rate maps of the 



subjects with dementia. Mean expansion rates are shown such that values 
of -0.04 or below are shown as black and values of 0.04 or above are 
shown as white. 



aim is to compute volume changes of brain structures - or even 
whole brains - it may be better to use less regularization because 
the noise will be averaged out over multiple voxels. However, if 
the aim is to make use of values in individual voxels, the optimal 
bias-variance tradeoff will be achieved with heavier regularization. 

We have presented a generative modeling framework for lon- 
gitudinal MRI, which combines rigid alignment, diffeomorphic 
warping and differential intensity non-uniformity correction with 
respect to a within -subject template that evolves to be an average 
with regard to all three of these aspects. The approach is symmetric 
and transitive by construction. In the pairwise case, it is not only 
inverse consistent, but the path (on the manifold of diffeomor- 
phisms) from one image to the other via the template is consistent 
with the direct geodesic path between the images. However, there 
is scope for further refinement of the model when dealing with 
images collected at more than two time-points. Such extensions 
would require ideas about variable rates of growth to be incorpo- 
rated (Fishbaugh et al, 201 1; Niethammer et al., 201 1; Trouve and 
Vialard, 2012) and will be investigated in future work. 
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APPENDIX 

A. DERIVATIVES FOR RIGID-BODY REGISTRATION 

To differentiate equation (5), we consider a single image and drop the n subscripts. 



L 



9 



-»(«.-')) 

dqi 



dx = k L f te 1 ) - f ) ( d ~H • ( (V/x) ° ) dx - (A1) 



Then a change of variables is incorporated to obtain the following: 



di;- 

Computing the above requires o § q . Inverting equation (4) gives: 



? q - 1 (x) = I 3> 4M^R- 1 M^j. 



^S_o| q j.(VM)]^x. 

' (A2) 



(A3) 



Using the identity d ^ - = — R 1 P R 1 to differentiate this, results in: 



This is combined with equation (4) to generate: 



— = -l3,4M; 1 R- 1 ^R ( 



(A4) 



3R„ 



°L = ~h 4 M~ 1 RT 1 — -M„ 

dqi Sq 3 ' q 9g; 



(A5) 



Gauss-Newton optimization requires a positive definite approximation to the matrix of second derivatives equation (11). The actual 
Hessian matrix is given by the following, but it is not guaranteed to be positive definite: 



dqidqj 2 J xeQ \ \ 4 // J xeQ 



'^M^M + u (r l ) -f) d2u ^ ' 

dqi dqj V V q / / dqidqj 



dx 



(A6) 



On average, \x ^§ q —f should be zero at the solution, so the second term can be omitted to obtain a positive definite approximation: 



M a a J dx=X ^.((V^ol- 1 ) ^.((V^o?- 1 ) rfx (A7) 



Finally, a change of variables gives: 



i L |D5 '! ( °4 (vm) ) ((t£ - f « 1 <7w> 1 * (As) 



B. TEMPLATE GRADIENTS 

The gradient in equation (23) is not the same as the spatial gradient of the template image (/x), which would be computed via 







V(/„o0 v „)+^ = 


, (f„o0 v JV|D0 v J 
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T=0 



dx 



T=0 



The above expression incorporates the gradient of the Jacobian determinants, which would have a detrimental effect on the 
registration (see Appendix C). The Gateaux variation of the matching term, with respect to variations in the initial velocity, is 

d X f 

-j- "f / | D 0v„ ( x ) I (fit ° 0v„ ( x ) - M (x - rh (x))) 2 dx 
dr 2 J xeQii 

^ 2 tea, \ Ez=i X;|D0 V) (x)| ) 

= k„ ( |D0 V(> (x) I (/„ o 4> Vn (x) - /i(x)) g (x) • h (x) dx (A10) 

where /x and g are computed as in equations (21) and (23). A similar scheme maybe used to derive the positive definite approximation 
to the second Gateaux variation. 

For the software implementation, the gradients of the warped images are computed by sampling the image and its gradients according 
to the transformation, and multiplying the gradients by the transpose of the Jacobian tensor at each point. 

V (f n o 0 V J = (D0 v J T ((V/ n ) o 0 V J (All) 

C. PAIRWISE SYMMETRY 

Pairwise symmetric registration is a special case of the group -wise formulation, and is of interest to many. We note that for pairwise 
registration (i.e., N = 2), where we define v = vi = — V2 (and the regularization term has been halved for convenience), the objective 
function reduces to 



e l ( MA 2 |D0 v (x)||D0_ v (x)| u . f . , ,x2, 1 ||T , 

S= i / I 1^ M | ■ , 1^. —t (/io0 v (x)-/ 2 o0_ v (x)) dx+dlLvVl 
2 JxeQu A.i D0 V (x) + X 2 D0_ v (x) 2 



2 



2 JxeQ^ A,i |D0 V (x)| + X 2 |D0_ V (x)| 
The solution, where derivatives of the objective function with respect to variations in v are zero, satisfies the following: 



(A12) 



X X X 2 |D0 V | |D0_ V | (Aq |D0 V | V (fi o 0 V ) + X 2 |D0_ V | V (f 2 o 0_ v )) (f 2 o 0_ v -ft o 0 V ) _ t T 

...... 2 — LyLyV. (A13J 

(A 1 |D0 v |+A 2 |D0_ v |) 2 

For the geodesic shooting in approach to work, we need to consider the solution along the entire trajectory of the diffeomorphism. 
To do this, we introduce another diffeomorphic mapping, £ , which is used to assess the effect of determining the initial velocities at 
some point other than the mid-point. For the geodesic shooting in equation (19) to work properly, the left-hand side of equation 
(45) (which we here refer to as momentum, u) must become ID£(D£) T (u o f )l if we replace 0 V and 0_ v with 0 V o £ and 0_ v o £ 
respectively. For typesetting reasons, we decompose the left-hand side of equation (45) into two factors and consider what happens to 
each of them if we make those substitutions. 

For the first factor [corresponding to a M (x) in equation 22], using the identity IP(0 o f )l = IDf l(ID0l o f ) we obtain 

A!A 2 |D(0 v o£)| |D (0_ v o{)|(/io0 Y o;-/ 2 o 0 _ v o g) = / X\X 2 |D0 V | |D0_ V | (/j o 0 V - f 2 o 0_ V/ 



Ai|D(0 v o£)|+A 2 |D(0_ v o£)| 

For the second factor [corresponding to g(x) in equation 22], we in addition use the element-wise matrix multiplication 
V (f o £ ) = (Df ) T (Vf ) o £ to obtain 



Xi |D(0 ¥ o<)| V(/io0 ¥ o<)+X 2 | 


D(0_ v o<)|V(/ 2 o0_ v o<) 




D(0v °<)| +X 2 


|D(0_ V °<)| 



Xi 


D0 v |V(/io0 v )+X 2 |D0_ v | 


V(/ 2 o^_ v ) 


Xi 


D0 V | +X 2 D0_ v 





= i 1 1 : . : ^ i 

(A15) 

By recombining the factors, we see that symmetry is achieved, while still satisfying the requirements of a geodesic shooting approach 
based on the EPdf/fequation (Euler-Poincare equation on diffeomorphisms). 

This symmetric registration approach is very similar to that of Hart et al. (2009), Niethammer et al. (2009), although others have 
proposed different strategies for achieving inverse consistency. Previously Beg and Khan (2007) presented two adjustments to enforce 
inverse consistency in the large deformation diffeomorphic metric mapping (LDDMM) approach. The first of these was based on align- 
ing an image pair to their simple half-way average. It is similar to the approach of Avants et al. (2008), as well as one of the approaches 
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proposed in Younes (2007). Unfortunately, it introduces a discontinuity in the evolution equations at the mid-point (Bruveris et al., 
2011). The second approach of Beg and Khan (2007) does not introduce this discontinuity, but leads to more complicated evolution 
equations that do not strictly obey EPdiff. Similarly, the other strategy for achieving inverse consistency proposed in Younes (2007) also 
does not obey these equations. 

We note that our matching term in equation (A12) does not satisfy all the desiderata set out by Tagare et al. (2009). In particular, we 
see that for aligning two constant images, this term will depend on the deformations. However, it is worth noting that the derivatives of 
the matching term with respect to velocity (see equation A13) are zero for constant images, so there should not be any tendency toward 
favoring any particular solution. 
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